Incremental learning based on non-incremental induction algorithm

Author

  • Sigita Misina
Abstract

Machine learning algorithms can be divided into two general types: non-incremental algorithms, which process all training examples at once, and incremental algorithms, which handle examples one by one. This paper describes the multi-layer incremental induction algorithm (MLII) [1] based on the non-incremental inductive inference algorithm CN2 [2]. Originally, the MLII algorithm was coupled with the non-incremental algorithm HCV [3], which is based on the extension matrix approach, works with symbolic variables, and generates a conjunctive formula as its result [3]. The MLII algorithm consists of three steps [1]: (1) data partitioning (randomly shuffle the initial data and divide it into subsets of approximately the same size); (2) generalization of the rules learned from the first subset; and (3) reduction of the set of previously obtained rules in order to produce more accurate and consistent rules. Multi-layer incremental induction divides an initial training set into subsets of approximately equal size, runs an existing induction algorithm on the first subset to obtain a first set of rules, and then processes the remaining data subsets one at a time [1], incorporating the induction results from the previous subsets. In this way, multi-layer induction accumulates the rules found from each data subset at each layer and produces a final integrated output. The data subsets are referred to as layers; by using layers, the effects of noise can be diluted and induction efficiency can be increased [1]. The paper presents practical experiments with the MLII algorithm on an e-mail message classification task. The basic e-mail dataset is processed using the Levenshtein distance [4] to obtain the most frequently used words in the e-mail Subject and Body. The learning set examples are randomly shuffled and divided into subsets using the data partitioning method.
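The partitioning and layer-by-layer accumulation described above can be illustrated with a minimal Python sketch. It is not the MLII procedure from [1]: the rule format, the `toy_induce` stand-in for CN2, and the `consistent` filter used in place of the generalization and reduction steps are all assumptions made for illustration only.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple

Example = Tuple[Tuple[str, ...], str]  # (attribute values, class label)
Rule = Tuple[int, str, str]            # (attribute index, value, predicted class)


def partition(examples: Sequence[Example], n_subsets: int, seed: int = 0) -> List[List[Example]]:
    """Shuffle the examples and split them into subsets whose sizes differ by at most one."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def toy_induce(subset: Sequence[Example]) -> Set[Rule]:
    """Hypothetical stand-in for CN2: one rule per attribute value seen with a single class."""
    seen = {}
    for attrs, label in subset:
        for i, value in enumerate(attrs):
            seen.setdefault((i, value), set()).add(label)
    return {(i, v, next(iter(labels))) for (i, v), labels in seen.items() if len(labels) == 1}


def consistent(rule: Rule, subset: Sequence[Example]) -> bool:
    """True if no example in the subset matches the rule's condition with another class."""
    i, value, predicted = rule
    return all(label == predicted for attrs, label in subset if attrs[i] == value)


def layered_induction(
    examples: Sequence[Example],
    n_subsets: int,
    induce: Callable[[Sequence[Example]], Set[Rule]] = toy_induce,
    seed: int = 0,
) -> Set[Rule]:
    """Learn rules on the first layer, then carry them through each remaining layer."""
    layers = partition(examples, n_subsets, seed)
    rules = set(induce(layers[0]))
    for layer in layers[1:]:
        # Assumed simplification: merge earlier rules with rules from the new layer,
        # then drop every rule that the new layer contradicts.
        candidates = rules | set(induce(layer))
        rules = {r for r in candidates if consistent(r, layer)}
    return rules
```

Passing a different `induce` callable is how a real rule learner would be plugged in; the filter step only checks the current layer, so rules learned late are not re-checked against earlier layers in this sketch.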
Rules are generated from the data subsets using the free software Sipina for Windows (Research version) [5], which applies the inductive inference algorithm CN2 with the following parameters: rule evaluation function: Laplacian error rate; significance level: 0.001. Experiments are performed to study the working principles of the MLII algorithm and to compare the use of the CN2 algorithm in MLII and in an interface agent for the e-mail message filtering task; these are practical experiments of what is described in...
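For readers unfamiliar with the two CN2 parameters named above, the following Python sketch shows the quantities as they are commonly defined for CN2: the Laplace accuracy estimate (n_c + 1) / (n + k), whose complement is the error rate, and the likelihood ratio statistic used for significance testing. How Sipina computes them internally is not stated here, so the function names and signatures are assumptions for illustration.

```python
import math
from typing import Sequence


def laplace_error(covered_counts: Sequence[int], predicted: int) -> float:
    """Laplace error estimate of a rule.

    covered_counts[c] is the number of examples of class c covered by the rule;
    predicted is the index of the class the rule predicts.
    """
    n_total = sum(covered_counts)
    n_classes = len(covered_counts)
    accuracy = (covered_counts[predicted] + 1) / (n_total + n_classes)
    return 1.0 - accuracy


def likelihood_ratio(covered_counts: Sequence[int], class_priors: Sequence[float]) -> float:
    """Likelihood ratio statistic 2 * sum(f_i * ln(f_i / e_i)).

    f_i is the observed count of class i among covered examples and e_i is the
    count expected if the rule covered examples at random (n_total * prior_i).
    """
    n_total = sum(covered_counts)
    statistic = 0.0
    for observed, prior in zip(covered_counts, class_priors):
        if observed > 0:  # a zero count contributes nothing to the sum
            statistic += observed * math.log(observed / (n_total * prior))
    return 2.0 * statistic
```

The statistic is compared against a chi-square threshold for the chosen significance level; a rule whose class distribution matches the priors scores 0 and would be rejected as not significant.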


Similar resources

A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System

In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use a boosting ensemble of weak classifiers to implement the misuse intrusion detection system. It can identify new classes of intrusions that do not exist in the training dataset for incremental misuse detection. As...


Distributed Incremental Least Mean-Square for Parameter Estimation using Heterogeneous Adaptive Networks in Unreliable Measurements

Adaptive networks include a set of nodes with adaptation and learning abilities for modeling various types of self-organized and complex activities encountered in the real world. This paper presents the effect of a heterogeneously distributed incremental LMS algorithm with ideal links on the quality of unknown parameter estimation. In heterogeneous adaptive networks, a fraction of the nodes, defi...


On the effect of low-quality node observation on learning over incremental adaptive networks

In this paper, we study the impact of a low-quality node on the performance of incremental least mean square (ILMS) adaptive networks. Adaptive networks involve many nodes with adaptation and learning capabilities. Low-quality mode in the performance of a node in a practical sensor network is modeled by the observation of pure noise (its observation noise) that leads to an unreliable measurement....


An Incremental DC Algorithm for the Minimum Sum-of-Squares Clustering

Here, an algorithm is presented for solving minimum sum-of-squares clustering problems using their difference of convex representations. The proposed algorithm is based on an incremental approach and applies the well-known DC algorithm at each iteration. The proposed algorithm is tested and compared with other clustering algorithms using large real-world data sets.


Tracking performance of incremental LMS algorithm over adaptive distributed sensor networks

In this paper, we focus on the tracking performance of the incremental adaptive LMS algorithm in an adaptive network. For this reason, we consider the unknown weight vector to be a time-varying sequence. First we analyze the performance of the network in tracking a time-varying weight vector, and then we explain the estimation of a Rayleigh fading channel through a random walk model. Closed form relations a...



Publication date: 2006